In any given randomization, the treatment mean and control mean are likely \(\neq\) the true means of \(Y(1)\) and \(Y(0)\)…
We want to know:
Following evidence of the effects of a soap opera in Rwanda:
A variation on the Rwandan study, conducted in Eastern DRC.
“I would not like that group to belong to my community association”; (1 = totally disagree; 4 = totally agree)
| \(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
|---|---|---|
| 1 | 3 | 2 |
| 2 | 4 | 4 |
| 3 | 4 | 2 |
| 4 | 2 | 3 |
| 5 | 2 | 4 |
| 6 | 4 | 1 |
We assign 3 regions to treatment (soap opera + talk show)
We assign 3 regions to control (soap opera only)
How many possible random assignments are there?
There are \(\binom{6}{3} = 20\); each row below lists the 3 treated regions:

| Treated regions | | |
|---|---|---|
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 2 | 6 |
| 1 | 3 | 4 |
| 1 | 3 | 5 |
| 1 | 3 | 6 |
| 1 | 4 | 5 |
| 1 | 4 | 6 |
| 1 | 5 | 6 |
| 2 | 3 | 4 |
| 2 | 3 | 5 |
| 2 | 3 | 6 |
| 2 | 4 | 5 |
| 2 | 4 | 6 |
| 2 | 5 | 6 |
| 3 | 4 | 5 |
| 3 | 4 | 6 |
| 3 | 5 | 6 |
| 4 | 5 | 6 |
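The count and the full list above can be generated directly in R; a quick check (`choose()` counts the combinations, `combn()` enumerates them):

```r
# Number of ways to choose 3 treated regions out of 6
choose(6, 3)
## [1] 20

# Enumerate all assignments: one row per assignment, listing the treated regions
assignments = t(combn(6, 3))
dim(assignments)
## [1] 20  3
```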
For each randomization, calculate the \(\widehat{ACE}\)
| \(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
|---|---|---|
| 1 | 3 | 2 |
| 2 | 4 | 4 |
| 3 | 4 | 2 |
| 4 | 2 | 3 |
| 5 | 2 | 4 |
| 6 | 4 | 1 |
What is the mean \(\widehat{ACE}\)?
How does it compare to the \(ACE\)?
Are there any randomizations where \(\widehat{ACE} = ACE\)?
Let’s check our work:
require(data.table) # for the data.table() constructor
p_o_table = data.table(region_i = 1:6,
                       y_i_1 = c(3, 4, 4, 2, 2, 4),
                       y_i_0 = c(2, 4, 2, 3, 4, 1))
# unit treatment effects (data.table syntax; equivalent to
# p_o_table$tau_i = p_o_table$y_i_1 - p_o_table$y_i_0)
p_o_table[, tau_i := y_i_1 - y_i_0]
# ACE: mean of the unit treatment effects
ace = mean(p_o_table$tau_i)
ace
## [1] 0.5
Let’s check our work:
randomizations = t(combn(6, 3)) # one row per assignment: the 3 treated regions
t_means = apply(randomizations, 1,
                function(x) mean(p_o_table[region_i %in% x, y_i_1]))
c_means = apply(randomizations, 1,
                function(x) mean(p_o_table[!(region_i %in% x), y_i_0]))
Let’s check our work:
t_means
## [1] 3.666667 3.000000 3.000000 3.666667 3.000000 3.000000 3.666667
## [8] 2.333333 3.000000 3.000000 3.333333 3.333333 4.000000 2.666667
## [15] 3.333333 3.333333 2.666667 3.333333 3.333333 2.666667
c_means
## [1] 2.666667 2.333333 2.000000 3.000000 3.000000 2.666667 3.666667
## [8] 2.333333 3.333333 3.000000 2.333333 2.000000 3.000000 1.666667
## [15] 2.666667 2.333333 2.333333 3.333333 3.000000 2.666667
Let’s check our work:
# Average Causal Effect estimates (hat), one per randomization
aces = t_means - c_means
# Expected value of the ACE (hat)
e_ace_hat = mean(aces)
e_ace_hat
## [1] 0.5
# ACE
ace
## [1] 0.5
Let’s check our work:
The sample difference in means is an unbiased estimator of the \(ACE\)
The histogram is the exact sampling distribution of \(\widehat{ACE}\) in this experiment
But we never observe this histogram in practice
Analytic/Asymptotic approach
Randomization inference
Bootstrap
First: we want to get variance of \(\widehat{ACE}\)
\[Var[X - Y] = Var[X] + Var[Y] - 2 \cdot Cov[X,Y]\]
What is \(Var[Y^T - Y^C] = Var[\widehat{ACE}]\)?
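The variance identity above can be checked numerically; a small sketch with simulated correlated draws (the identity holds exactly for sample moments, so `all.equal()` returns `TRUE`):

```r
set.seed(1)
# Simulate correlated draws to check Var[X - Y] = Var[X] + Var[Y] - 2 Cov[X, Y]
x = rnorm(1000)
y = 0.5 * x + rnorm(1000)  # y is correlated with x by construction
lhs = var(x - y)
rhs = var(x) + var(y) - 2 * cov(x, y)
all.equal(lhs, rhs)
## [1] TRUE
```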
Variances of Treatment/Control Group Means
if we assume i.i.d. draws from the study group (is this correct?)
\[Var[Y^T] = \frac{Var[Y_i(1)]}{m}\]
Variance of sampling distribution of the treatment-group mean is variance of all potential outcomes under treatment divided by the treatment group size
Variance of potential outcomes under treatment:
\[Var[Y_i(1)] = \frac{1}{N}\sum\limits_{i=1}^{N} \left( Y_i(1) - \frac{\sum\limits_{i=1}^{N} Y_i(1)}{N} \right) ^2\]
This is a parameter, often denoted \(\sigma^2\)
\[Var[Y^T] = \frac{\sigma^2}{m}\]
We don’t know \(\sigma^2\), we need to estimate it from our sample
Like the sample mean, the corrected sample variance is an unbiased estimator of the population variance:
\[\widehat{Var[Y_i(1)]} = \widehat{\sigma^2} = \frac{1}{m-1}\sum\limits_{i=1}^{m}\left( [Y_i(1) \mid Z_i = 1] - Y^T \right)^2\]
Why is sample variance biased if we divide by \(m\) (instead of \(m-1\))?
mean of data minimizes the sum of squared errors
If the sample mean \(\hat\mu\) \(\neq\) population mean \(\mu\), then \(\left[ \sum\limits_{i = 1}^{m} [x_i - \hat\mu]^2 \right] < \left[ \sum\limits_{i = 1}^{m} [x_i - \mu]^2 \right]\)
Uncorrected sample variance \(\widehat{\sigma^2}\) is \(\frac{1}{m} \sum\limits_{i = 1}^{m} [x_i - \hat\mu]^2\).
Then, \(\widehat{\sigma^2} < \sigma^2\) unless sample mean equals population mean
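A small simulation sketch of this bias (the population here is standard normal, so \(\sigma^2 = 1\), an assumption for illustration): dividing by \(m\) underestimates \(\sigma^2\) by a factor of roughly \((m-1)/m\), while dividing by \(m - 1\) is unbiased.

```r
set.seed(2)
m = 5
sims = replicate(1e5, {
  x = rnorm(m)  # population variance sigma^2 = 1
  c(biased   = sum((x - mean(x))^2) / m,
    unbiased = sum((x - mean(x))^2) / (m - 1))
})
rowMeans(sims)  # biased averages near (m-1)/m = 0.8; unbiased near 1
```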
Using this approach:
\[\widehat{Var[Y_i(1)]} = \widehat{\sigma^2} = \frac{1}{m-1}\sum\limits_{i=1}^{m}\left( [Y_i(1) \mid Z_i = 1] - Y^T \right)^2\]
\[Var[Y^T] = \frac{\sigma^2}{m}\]
we can estimate \(Var(Y^T)\) and \(Var(Y^C)\).
What else do we need to estimate \(Var[\widehat{ACE}]\)?
We still need \(Cov(Y^T,Y^C)\) to get variance of \(\widehat{ACE}\), because
\(Var[\widehat{ACE}] = Var[Y^T] + Var[Y^C] - 2 Cov[Y^T, Y^C]\)
\[Cov(Y^T,Y^C) = -\frac{1}{N(N-1)}\sum\limits_{i=1}^{N} \left( Y_i(1) - \frac{\sum\limits_{i=1}^{N} Y_i(1)}{N} \right) \cdot \left(Y_i(0) - \frac{\sum\limits_{i=1}^{N} Y_i(0)}{N} \right)\]
Can’t estimate the covariance because we don’t see both potential outcomes for each case!
We can safely ignore the covariance term (which deflates the variance), because
the variances we obtain with \(\widehat{Var}[\widehat{ACE}] = \widehat{Var}[Y^T] + \widehat{Var}[Y^C]\) are going to be conservative: at least as large as the true variance, on average.
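Because the table above gives both potential outcomes for every region, the conservativeness can be checked exactly in this example: compute \(\widehat{ACE}\) and the covariance-free variance estimate for all 20 assignments, and compare the average estimate to the true sampling variance. A sketch (in a real experiment only the estimate would be available):

```r
require(data.table)
p_o_table = data.table(region_i = 1:6,
                       y_i_1 = c(3, 4, 4, 2, 2, 4),
                       y_i_0 = c(2, 4, 2, 3, 4, 1))
randomizations = t(combn(6, 3))

# For each assignment: the ACE estimate and the covariance-free variance estimate
est = apply(randomizations, 1, function(tr) {
  yt = p_o_table[region_i %in% tr, y_i_1]
  yc = p_o_table[!(region_i %in% tr), y_i_0]
  c(ace_hat = mean(yt) - mean(yc),
    var_hat = var(yt) / 3 + var(yc) / 3)
})

# True sampling variance of the estimator vs. the average estimated variance
true_var = mean((est["ace_hat", ] - mean(est["ace_hat", ]))^2)
mean(est["var_hat", ]) >= true_var
## [1] TRUE
```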
We’ve been trying to estimate the variance of the \(\widehat{ACE}\).
Variance is not usually what we want
Per the CLT, the sampling distributions of sums of random variables (and, by extension, their means) approach the normal distribution as \(N \rightarrow \infty\).
Using this fact, the estimated sample mean and variance of the sample mean let us approximate the sampling distribution of \(\widehat{ACE}\).
This approximation performs well, but depends on sample size and population distribution.
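One way to see the dependence on sample size: simulate the sampling distribution of the mean for a skewed population (exponential here, an arbitrary choice for illustration) and track its skewness, which for exponential draws falls at roughly the rate \(2/\sqrt{n}\):

```r
set.seed(3)
# Sampling distribution of the mean from a skewed (exponential) population
sample_means = function(n, reps = 1e4) replicate(reps, mean(rexp(n)))
skewness = function(x) mean((x - mean(x))^3) / sd(x)^3

s = sapply(c(5, 25, 100), function(n) skewness(sample_means(n)))
s  # skewness shrinks toward 0 (normality) as n grows: roughly 0.9, 0.4, 0.2
```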
If the population looks like this:
Predict the shape of the sampling distribution for \(n = 5\)
The shape of the sampling distribution for \(n = 5\)
If the population looks like this:
Predict the shape of the sampling distribution for \(n = 5\)
The shape of the sampling distribution for \(n = 5\) is
The sampling distribution for \(n = 25\) is:
The sampling distribution for \(n = 100\) is
Does normality hold in our experiment?
We run an experiment on our 6 regions, and observe \(\widehat{ACE} = 0.667\)
The hypothesis test investigates: what is the probability of observing a value this large or larger if the true \(ACE = 0\)?
If distributional assumptions are wrong, hypothesis test will not be correct
Alternatives:
bootstrap
randomization inference
Unlike analytical approach:
Tests a different null
Usually null is that the average effect is \(0\) (some units could have positive or negative effects).
\[\frac{1}{N}\sum\limits_{i=1}^{N} \tau_i = ACE = 0\]
Randomization inference tests the sharp null hypothesis
\[\tau_i = 0 \quad \forall \ i \in \{1, \dots, N\},\]
that every unit treatment effect is \(0\).
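Randomization inference can be hand-rolled for the six-region experiment. Under the sharp null, every region's observed outcome is fixed regardless of assignment, so the difference in means can be recomputed for all 20 assignments. The assignment below (regions 1, 2, 4 treated) is one assignment consistent with the observed \(\widehat{ACE} = 0.667\) given the potential-outcomes table, assumed here for illustration:

```r
# Observed data: regions 1, 2, 4 treated (assumed for illustration);
# obs_y holds the revealed outcomes: Y(1) for treated units, Y(0) for controls
obs_y = c(3, 4, 2, 2, 4, 1)
obs_z = c(1, 1, 0, 1, 0, 0)
obs_ace = mean(obs_y[obs_z == 1]) - mean(obs_y[obs_z == 0])  # 0.667

# Under the sharp null, recompute the difference in means for every assignment
perms = t(combn(6, 3))
null_dist = apply(perms, 1, function(tr) mean(obs_y[tr]) - mean(obs_y[-tr]))

# One-sided p-value: share of assignments with an estimate >= the observed one
mean(null_dist >= obs_ace)
## [1] 0.4
```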
Advantages:
Disadvantages
In practice:
| region_i | y_i_1 | y_i_0 | tau_i |
|---|---|---|---|
| 1 | 3 | 2 | 1 |
| 2 | 4 | 4 | 0 |
| 3 | 4 | 2 | 2 |
| 4 | 2 | 3 | -1 |
| 5 | 2 | 4 | -2 |
| 6 | 4 | 1 | 3 |
Once we have this table of potential outcomes, we:
## Warning in estate(y, Z): Probabilities not specified. Assuming equal
## probabilities.
## Warning in gendist(Ys, perms): Generating probabilities from permutation
## matrix.
In R:
#install.packages('ri')
require(ri)
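A sketch of the full workflow with the ri package (now archived on CRAN), assuming its standard functions genperms(), genouts(), gendist(), estate(), and dispdist(); the warnings shown above are what estate() and gendist() emit when assignment probabilities are left unspecified:

```r
require(ri)

y = c(3, 4, 2, 2, 4, 1)  # observed outcomes (regions 1, 2, 4 treated, assumed)
Z = c(1, 1, 0, 1, 0, 0)  # observed assignment

perms = genperms(Z)             # all 20 possible assignments
Ys = genouts(y, Z, ace = 0)     # potential outcomes implied by the sharp null
distout = gendist(Ys, perms)    # sampling distribution under the sharp null
dispdist(distout, estate(y, Z)) # p-values and a histogram of the null distribution
```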